Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More aggressively mask user code errors when masking enabled #27183

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

benpankow
Copy link
Member

@benpankow benpankow commented Jan 16, 2025

Summary

Test Plan

More unit tests.

@benpankow
Copy link
Member Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@benpankow benpankow requested a review from gibsondan January 16, 2025 23:39
@benpankow benpankow changed the title aggressively mask user code errors More aggressively mask user code errors when masking enabled Jan 17, 2025
@benpankow benpankow marked this pull request as ready for review January 17, 2025 19:48
@benpankow benpankow force-pushed the benpankow/user-code-errors-mask-aggressively branch from 333ee23 to c6aef4f Compare January 17, 2025 19:49
@gibsondan gibsondan requested a review from alangenfeld January 17, 2025 19:50
Copy link
Member

@alangenfeld alangenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a summary is warranted for this PR given its complexity

Comment on lines +100 to +128
error_id_by_exception: ContextVar[Mapping[int, str]] = ContextVar(
"error_id_by_exception", default={}
)


@contextlib.contextmanager
def redact_user_stacktrace_if_enabled():
"""Context manager which, if a user has enabled redacting user code errors, logs exceptions raised from within,
and clears the stacktrace from the exception. It also marks the exception to be redacted if it was to be persisted
or otherwise serialized to be sent to Dagster Plus. This is useful for preventing sensitive information from
being leaked in error messages.
"""
if not _should_redact_user_code_error():
yield
else:
try:
yield
except BaseException as e:
exc_info = sys.exc_info()

# Generate a unique error ID for this error, or re-use an existing one
# if this error has already been seen
existing_error_id = error_id_by_exception.get().get(id(e))

if not existing_error_id:
error_id = str(uuid.uuid4())

# Track the error ID for this exception so we can redact it later
error_id_by_exception.set({**error_id_by_exception.get(), id(e): error_id})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the way you are using the context var here is equivalent to just having a process global dict. What exactly is the intention here and do any of the existing tests validate that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe the goal is to increase the set of cases that this logic can handle (all kinds of exceptions besides DagsterUserCodeExecutionError can be emitted within user code, like KeyboardInterrupt or SystemExit or other DagsterError subclasses) while still only triggering the redaction if the exception was actually raised within a op_execution_error_boundary or user_code_error_boundary. I don't have a strong opinion about global dict vs. contextvar

Copy link
Member

@gibsondan gibsondan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be out next week so may lean on you alex to do the final accept once you have full context, but this broadly makes sense to me

Comment on lines +100 to +102
error_id_by_exception: ContextVar[Mapping[int, str]] = ContextVar(
"error_id_by_exception", default={}
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name here should probably have 'redacted' in it: redacted_user_code_error_id_by_exception?

Comment on lines +100 to +128
error_id_by_exception: ContextVar[Mapping[int, str]] = ContextVar(
"error_id_by_exception", default={}
)


@contextlib.contextmanager
def redact_user_stacktrace_if_enabled():
"""Context manager which, if a user has enabled redacting user code errors, logs exceptions raised from within,
and clears the stacktrace from the exception. It also marks the exception to be redacted if it was to be persisted
or otherwise serialized to be sent to Dagster Plus. This is useful for preventing sensitive information from
being leaked in error messages.
"""
if not _should_redact_user_code_error():
yield
else:
try:
yield
except BaseException as e:
exc_info = sys.exc_info()

# Generate a unique error ID for this error, or re-use an existing one
# if this error has already been seen
existing_error_id = error_id_by_exception.get().get(id(e))

if not existing_error_id:
error_id = str(uuid.uuid4())

# Track the error ID for this exception so we can redact it later
error_id_by_exception.set({**error_id_by_exception.get(), id(e): error_id})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe the goal is to increase the set of cases that this logic can handle (all kinds of exceptions besides DagsterUserCodeExecutionError can be emitted within user code, like KeyboardInterrupt or SystemExit or other DagsterError subclasses) while still only triggering the redaction if the exception was actually raised within a op_execution_error_boundary or user_code_error_boundary. I don't have a strong opinion about global dict vs. contextvar

Comment on lines +166 to +179
if isinstance(e, DagsterUserCodeExecutionError):
return SerializableErrorInfo(
message=(
f"Error occurred during user code execution, error ID {err_id}. "
"The error has been masked to prevent leaking sensitive information. "
"Search in logs for this error ID for more details."
),
stack=[],
cls_name="DagsterRedactedUserCodeError",
cause=None,
context=None,
)
else:
tb_exc = traceback.TracebackException(exc_type, e, tb)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe worth explaining the difference between these two cases - with user code errors, you don't even want to show the message - but with other errors (framework errors or interrupts or Failure / RetryRequested raised within the error boundary), the message is not sensitive and can be displayed for clarity, but the traceback is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants